Large-Scale Text Similarity Computing with Spark


Similar resources

Towards Large Scale Environmental Data Processing with Apache Spark

Currently available environmental datasets are either manually constructed by professionals or automatically generated from the observations provided by sensing devices. Usually, the former are modelled and recorded with traditional general-purpose relational technologies, whereas the latter require more specific scientific array formats and tools. Declarative data processing technologies are a...

Large-Scale Online Expectation Maximization with Spark Streaming

Many “Big Data” applications in Machine Learning (ML) need to react quickly to large streams of incoming data. The standard paradigm nowadays is to run ML algorithms on frameworks designed for batch operations, such as MapReduce or Hadoop. By design, these frameworks are not a good match for low-latency applications. This is why we explore using a new, recently proposed model for large-scale st...

Large Scale Sentiment Analysis on Twitter with Spark

Sentiment analysis on Twitter data has attracted much attention recently. One of the system’s key features is the immediacy of communication with other users in an easy, user-friendly and fast way. Consequently, people tend to express their feelings freely, which makes Twitter an ideal source for accumulating a vast amount of opinions towards a wide diversity of topics. This amount of informat...

Large-Scale Similarity Joins With Guarantees

The ability to handle noisy or imprecise data is becoming increasingly important in computing. In the database community the notion of similarity join has been studied extensively, yet existing solutions have offered weak performance guarantees. Either they are based on deterministic filtering techniques that often, but not always, succeed in reducing computational costs, or they are based on r...

Large Scale Text Analysis

We take an algorithmic and computational approach to the problem of providing patent recommendations, developing a web interface that allows users to upload their draft patent and returns a list of ranked relevant patents in real time. We develop scalable, distributed algorithms based on optimization techniques and sparse machine learning, with a focus on both accuracy and speed.

Journal

Journal title: International Journal of Grid and Distributed Computing

Year: 2016

ISSN: 2005-4262

DOI: 10.14257/ijgdc.2016.9.4.09